An enormous benefit of remote sensing is the capacity to discern different materials based on their spectral reflectance, allowing for the determination of land cover types across large areas. This categorization of data is fundamental for comprehending the distribution and evolution of land cover types, providing valuable insight regarding the impacts of changing environmental conditions under climate change.
While there are multiple approaches to performing land cover classification, this investigation will focus on a supervised approach, specifically a decision tree classifier to assess the land cover in southern Santa Barbara County. A decision tree classifier classifies pixels within the multi-spectral image involves using a training data set, where each pixel in the training set has data on it’s land cover classification and it’s spectral properties. Using this training data, conditions are developed regarding the attributes required for characterization of various land cover types, thus when a applied to a new spectral image, the trained decision tree is able to identify land cover type for the individual pixels.1
As residents of Santa Barbara, California, the purpose of this spectral analysis is to determine the land cover types for southern Santa Barbara County using multi-spectral imagery and location data for the following landcover types :
credit: this lab is based on a materials developed by Chris Kibler.
In order to train our decision tree, the training data is a scene from Landsat 5 collected on September 25, 2007. This scene is a collection 2 surface reflactnace product and contains spectral-data of bands 1, 2, 3, 4, 5, and 7.
The study area of interested is defined by a polygon representing southern Santa Barbara County. The training data file for southern Santa Barbara county contains polygons representing a singular land cover type out of our 4 land cover types of interest: green vegetation, dry grass or soil, urban, and water.
library(sf)
library(terra)
library(here)
library(dplyr)
library(rpart)
library(rpart.plot)
library(tmap)
rm(list = ls())
here::i_am("SBLandCover.Rmd")
setwd(here())
In order to assess the land cover type across southern Santa Barbara county, we must first train a decision tree with training data regarding the spectral-reflectance information characteristic of each landcover type. For this investigation the training data is a scene from Landsat 5 on September 25, 2007. This will be achieved by :
NA and all valid pixels will be updated based on the
scaling factor#loading in landsat data -----------
filelist <- list.files("./data/landsat-data/", full.names = TRUE) #creating a list containing the names all the six spectral band files
landsat <- rast(filelist) #storing bands as a singular raster stack
names(landsat) <- c("blue", "green", "red", "NIR", "SWIR1", "SWIR2") #updating layer names to match the spectral band data they contain
plotRGB(landsat, r = 3, g = 2, b = 1, stretch = "lin") # plot true color image, telling it what bands we would like to put in each color
#loading in study area data -----------
SB_county_south <- st_read("./data/SB_county_south.shp", quiet = TRUE) #reading in the shapefile for southern SB county
SB_county_south <- st_transform(SB_county_south, crs = st_crs(landsat)) #reprojecting study aea polygon to match the CRS of the landsat data
#cropping landsat data to study site ----------
landsat_cropped <- crop(landsat, SB_county_south) #cropping landsat scene to match the extent of the study site
landsat_mask <- mask(landsat_cropped, SB_county_south) #masking the landsat raster to the sourthern region of Santa Barbara county
rm(landsat, landsat_cropped, SB_county_south) # remove unnecessary object from environment so that R does not need to maintain it in its memory
plotRGB(landsat_mask, r = 3, g = 2, blue = 1, stretch = "lin") #plotting to ensure that we have masked to our study site, which contains our training data
#convesting landsat values to reflectance -----------
rcl <- matrix(c(-Inf, 7273, NA,
43636, Inf, NA),
ncol = 3, byrow = TRUE) #creating a reclassification matrix defining the valid pixel range, assigning all pixels outside of this range to be NA
landsat <- classify(landsat_mask, rcl = rcl) #applying reclassification matrix to landat raster
landsat <- (landsat * 0.0000275 - 0.2)*100 #adjusting values based on reflectance scaling factor to get reflectance applying, and multiplying by 100 to get percent reflectance
summary(landsat) #viewing summary to ensure that all values are between 0 - 100, indicated that adjustments were successfully applied
## blue green red NIR
## Min. : 1.11 Min. : 0.74 Min. : 0.00 Min. : 0.23
## 1st Qu.: 2.49 1st Qu.: 2.17 1st Qu.: 1.08 1st Qu.: 0.75
## Median : 3.06 Median : 4.59 Median : 4.45 Median :14.39
## Mean : 3.83 Mean : 5.02 Mean : 4.92 Mean :11.52
## 3rd Qu.: 4.63 3rd Qu.: 6.76 3rd Qu.: 7.40 3rd Qu.:19.34
## Max. :39.42 Max. :53.32 Max. :56.68 Max. :57.08
## NA's :39856 NA's :39855 NA's :39855 NA's :39856
## SWIR1 SWIR2
## Min. : 0.10 Min. : 0.20
## 1st Qu.: 0.41 1st Qu.: 0.60
## Median :13.43 Median : 8.15
## Mean :11.88 Mean : 8.52
## 3rd Qu.:18.70 3rd Qu.:13.07
## Max. :49.13 Max. :48.07
## NA's :42892 NA's :46809
Now that our landsat raster is updated to be a single raster stack containing updated reflectance values for pixels that fall in the allotted range, we must now characterize the percent reflectance values as specific land cover types. This is achieved by:
#loading in training data -----------
training_data <- st_read("data/trainingdata.shp", quiet = TRUE) %>%
st_transform(., crs= st_crs(landsat)) #reading in training data shape file and reprojecting it to match the crs of the landsat raster
training_data_attributes <- training_data %>%
st_drop_geometry() #converting this training data into a data frame
#extracting reflectance values for the training data -----------
training_data_values <- terra::extract(landsat, training_data, df = TRUE) #extracting landsat reflectance values at each of the training site polygons
#joining training data and extracted reflectance dataframe ------------
SB_training_data <- left_join(training_data_values, training_data_attributes,
by = c("ID" = "id")) %>%
mutate(type = as.factor(type)) #joining the training data attribute dataframe to the dataframe containing extracted reflectance values, and converting landcover type to a factor
Once we have created a training dataframe containing land cover types
and their associated percent spectral reflectance values, we can use
this dataframe to train the decision tree. In order to train a decision
tree, a model formula must be established, indicating what our response
and predictor variables are. In this case, our reponse is the land cover
type, and our predicator variables are the different spectral band
values. Once we have established a model formula, we can use an
rpart function, defining the model formula, dataset, and
method, to perform a classification.
#training decision tree ------------
SB_formula <- type ~ red + green + blue + NIR + SWIR1 + SWIR2 #defining model formula
SB_decision_tree <- rpart(formula = SB_formula,
data = SB_training_data,
method = "class",
na.action = na.omit) #training decision tree using rpart package, setting the method to classification and telling it to omit any pixels valued NA from our analysis
prp(SB_decision_tree) #plotting decision tree to ensure training was successful
The trained decision tree can now be applying to our entire image of
southern Santa Barbara county, extending beyond the region outlined in
our training sites, to classify land cover types throughout southern SB
county. To apply the decision tree, the predict function
within the terra package will be utilized. The output of
this classification will be a raster layer with integer value, where
each value corresponds to the factor level in the training
data, representing a specific land cover classification.
#applying decision tree to entire image -----------
SB_classification <- predict(landsat, SB_decision_tree, type = "class", na.rm = TRUE) #telling predict function to predict our landsat scene data based on the trained decision tree, and telling the function to perdict by classification
levels(SB_classification) #insepcting levels of classified output of decision tree to investigate which integer values correspond with which land cover type
## [[1]]
## value class
## 1 1 green_vegetation
## 2 2 soil_dead_grass
## 3 3 urban
## 4 4 water
Upon applying the decision tree to the original landsat image, a raster is outputted containing the land cover classifications for all of southern Santa Barbara county. We can now plot this raster create a land cover map depicting the land cover classifications across southern SB!
#plotting decision tree output -----------
tm_shape(SB_classification)+ #mapping land cover classification across southern Santa Barbara county
tm_raster(col.scale = tm_scale_categorical(values = c("#8DB580", "#F2DDA4", "#7E8987", "#6A8EAE")), #changing color palette to logicallt match landcover type
col.legend = tm_legend(title = "Landcover Type"))+
tm_layout(title = "Land Cover Classification in Southern Santa Barbara County", title.position = c("right", "bottom"))
Global Land Analysis and Discovery. (n.d.). Land Cover Classification. https://glad.umd.edu/ard/land-cover-classification↩︎
United States Geological Survey. (n.d.). How do I use a scale factor with landsat level-2 science products?. United States Geological Survey | science for a changing world. https://www.usgs.gov/faqs/how-do-i-use-a-scale-factor-landsat-level-2-science-products#:~:text=Landsat%20Collection%202%20surface%20reflectance,offset%20of%20%2D0.2%20per%20pixel.↩︎